Doc: KIND complex network scenarios #1337

Closed
wants to merge 8 commits into from

aojea
Contributor

@aojea aojea commented Feb 17, 2020

There were several PRs and requests to implement this in KIND.
However, I think that KIND can serve better as a building block for complex scenarios that can be easily scripted, which avoids adding complexity to the project.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 17, 2020
@aojea
Contributor Author

aojea commented Feb 17, 2020

/assign @BenTheElder
/cc @howardjohn @qinqon @neiljerram

@k8s-ci-robot
Contributor

@aojea: GitHub didn't allow me to request PR reviews from the following users: neiljerram, howardjohn, qinqon.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/assign @BenTheElder
/cc @howardjohn @qinqon @neiljerram

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nelljerram nelljerram left a comment

Just a couple of typos that I spotted.


## Multiple clusters

As we explained before, all KIND clusters are sahring the same docker network, that means that all the cluster nodes have direct connectivity.

typo "sahring"


As we explained before, all KIND clusters are sahring the same docker network, that means that all the cluster nodes have direct connectivity.

If we want to spawn multiple cluster and provide Pod to Pod connectivity between different clusters, first we have to configure the cluster networking parameters to avoid address overlapping.

"multiple clusters"

Member

@tao12345666333 tao12345666333 left a comment

Great!

inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
{{< /codeFromInline >}}

That means that Pods will be able to reach other dockers containers that does not belong to any KIND cluster, however, the docker container will not be able to answer to the Pod IP address until we intall the correspoding routes.
Member

reach other dockers containers

maybe should change to:

reach other docker containers

Contributor Author

I had the same doubt, but with the number of "containers", pods, ... that we have in these virtualized environments, I think it may be good to be explicit about this

Member

@tao12345666333 tao12345666333 left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 17, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aojea, tao12345666333
To complete the pull request process, please assign bentheelder
You can assign the PR to them by writing /assign @bentheelder in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nelljerram

@aojea Would this be a good time to talk more about your comment at #939 (comment) ? I understand that you and @BenTheElder had concerns about my proposal at the time, but I am not sure you are right that my objective can be easily achieved in existing other ways.

@aojea
Contributor Author

aojea commented Feb 17, 2020

@aojea Would this be a good time to talk more about your comment at #939 (comment) ? I understand that you and @BenTheElder had concerns about my proposal at the time, but I am not sure you are right that my objective can be easily achieved in existing other ways.

/hold

@neiljerram my understanding is that you want to automate in KIND:

  • attaching new networks to the KIND nodes
  • adding custom static routes to the KIND nodes
  • adding loopback interfaces with custom IPs to the KIND nodes

If that's correct, I can document how to do it. It seems easy to automate with a script, or in the same way that the kubeadm folks are doing with kinder https://github.com/kubernetes/kubeadm/tree/master/kinder#usage

IMHO that seems a very intrusive and specific change to target multihomed node environments, which are not very common in cloud environments ... bear in mind that the main goal of KIND is testing Kubernetes.
Personally, what I'm more afraid of is having more dependencies on libnetwork, which is really opinionated for things like IPv6 or DNS behavior. I don't know what those docker network connect commands will change ...
However, @BenTheElder may have another opinion ...

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 17, 2020
@nelljerram

nelljerram commented Feb 17, 2020

Thanks @aojea for your interest in this.

At the time of that PR, I was modelling a multi-homed infrastructure, with two independent planes of connectivity between any two nodes. Obviously the idea is that if one of the connectivity planes fails in some way, we still have connectivity between all the nodes over the other plane.

My reason for thinking that this needs integration in KIND is as follows.

  • For a setup like this to be resilient, it is important that connections to or from a node do not use an interface-specific address as their source or destination IP. Because then the connection cannot continue if the connectivity fails adjacent to that interface-specific address. So instead we want to set up a "loopback address" on each node and arrange for all outgoing connections to use that address.

  • That includes the connections that are established for the functioning of the Kubernetes control plane itself. For example each node's kubelet connecting to the API server should have src IP = loopback address of kubelet node.

  • Therefore, I think, the provisioning of the loopback address, and the routing to loopback addresses on other nodes, must be done as part of the KIND cluster setup.

WDYT? Am I still missing other possible approaches here?

@aojea
Contributor Author

aojea commented Feb 18, 2020

/hold cancel

Thanks @aojea for your interest in this.

At the time of that PR, I was modelling a multi-homed infrastructure, with two independent planes of connectivity between any two nodes. Obviously the idea is that if one of the connectivity planes fails in some way, we still have connectivity between all the nodes over the other plane.

My reason for thinking that this needs integration in KIND is as follows.

  • For a setup like this to be resilient, it is important that connections to or from a node do not use an interface-specific address as their source or destination IP. Because then the connection cannot continue if the connectivity fails adjacent to that interface-specific address. So instead we want to set up a "loopback address" on each node and arrange for all outgoing connections to use that address.
  • That includes the connections that are established for the functioning of the Kubernetes control plane itself. For example each node's kubelet connecting to the API server should have src IP = loopback address of kubelet node.
  • Therefore, I think, the provisioning of the loopback address, and the routing to loopback addresses on other nodes, must be done as part of the KIND cluster setup.

WDYT? Am I still missing other possible approaches here?

Yeah, I totally understand your point from the network engineering perspective, but that setup needs a routing protocol to work and do the failover. I know that Calico and kube-router give that possibility by allowing you to peer with the leaf switches, but as I've said before, this is a very specific scenario for bare metal environments, where you don't have an IaaS handling the infrastructure.

For "cloud-native" environments, the IaaS + cloud-controller-manager and Kubernetes + controller loops handle the "resilience" of the environment. The VMs only need one interface because the network is "virtual" and the IaaS handles it. For the Kubernetes workloads, the controller loops handle the pods and services: they restart containers that fail, replace containers, kill containers that don't respond to your user-defined health check, and don't advertise them to clients until they are ready to serve. Basically everything is cattle .... or should be 😉

Specifically for KIND, the network is a Linux bridge; everything is software and on the same host, so if one interface or bridge fails we'll have a bigger problem 😄

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 18, 2020
@aojea
Contributor Author

aojea commented Feb 18, 2020

/retest

@nelljerram

@aojea Many thanks. So if I understand correctly, I think your position can be summarised as:

  • You agree that KIND-level support would be needed, for someone to correctly model that topology using KIND.

  • But you don't want to add the complexity for that to KIND, because you don't think it's an important enough use case for the Kubernetes community.

Is that right?

@aojea
Contributor Author

aojea commented Feb 18, 2020

  • You agree that KIND-level support would be needed, for someone to correctly model that topology using KIND.

My point is that I don't see the need to implement it in KIND because you can do it just after the cluster creation. This is an example with bash; using Python, Go, ... you can easily build much more complex topologies and parametrize them:

LOOPBACK_PREFIX="1.1.1."
MY_BRIDGE="my_net2"
MY_ROUTE=10.0.0.0/24
MY_GW=172.16.17.1
# Create 2nd network
docker network create ${MY_BRIDGE}
# Create kubernetes cluster
kind create cluster
# Configure nodes to use the second network
i=1
for n in $(kind get nodes); do
  # Connect the node to the second network
  docker network connect ${MY_BRIDGE} ${n}
  # Configure a loopback address (one per node, using the counter i)
  docker exec ${n} ip addr add ${LOOPBACK_PREFIX}${i}/32 dev lo
  # Add static routes
  docker exec ${n} ip route add ${MY_ROUTE} via ${MY_GW}
  i=$((i + 1))
done
  • But you don't want to add the complexity for that to KIND, because you don't think it's an important enough use case for the Kubernetes community.

It's not just that: KIND is gating Kubernetes and is used as CI in a large number of Kubernetes ecosystem projects. I'm afraid that introducing this change could affect the stability of the project, and hence all these CIs. You can't imagine the number of hours that mainly @BenTheElder, @amwat, I and others have spent debugging flakiness and optimizing KIND.
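
As a dry-run companion to the script above, the per-node steps can be generated as plain text first and reviewed before anything touches the nodes. This is only a sketch: `gen_node_cmds` is a hypothetical helper name, the node names in the example call are placeholders, and the loopback numbering (one address per node, counted from 1) is an assumption about how the prefix is meant to be used.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the per-node commands instead of executing them.
# gen_node_cmds is a hypothetical helper, not part of kind or docker.
# Arguments: bridge, loopback prefix, route, gateway, then the node names.
gen_node_cmds() {
  local bridge="$1" prefix="$2" route="$3" gw="$4"
  shift 4
  local i=1 n
  for n in "$@"; do
    echo "docker network connect ${bridge} ${n}"
    echo "docker exec ${n} ip addr add ${prefix}${i}/32 dev lo"
    echo "docker exec ${n} ip route add ${route} via ${gw}"
    i=$((i + 1))
  done
}

# Placeholder node names; on a real host the list would come from `kind get nodes`.
gen_node_cmds my_net2 1.1.1. 10.0.0.0/24 172.16.17.1 kind-control-plane kind-worker
```

Once the printed commands look right, the same output could be piped to `sh` to apply them.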

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 18, 2020
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@nelljerram

@aojea

my point is that I don't see the need to implement it in KIND because you can do it just after the cluster creation,

I described in my previous comment why this is not good enough: I need the Kubernetes control plane connections to be using loopback addresses, and IIRC those are set up during cluster creation. Do you think I've got something wrong there?

@aojea
Contributor Author

aojea commented Feb 19, 2020

@aojea

my point is that I don't see the need to implement it in KIND because you can do it just after the cluster creation,

I described in my previous comment why this is not good enough: I need the Kubernetes control plane connections to be using loopback addresses, and IIRC those are set up during cluster creation. Do you think I've got something wrong there?

OK, now I get it, sorry for the confusion, I wasn't understanding your point ...

It can be done after the cluster setup, though it is a bit tricky.

When creating the cluster, add the loopback IP address you are going to use for the control plane to the certificate SANs (the apiserver binds to "all-interfaces" by default):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
# add the loopback to apiServer cert SANs
kubeadmConfigPatchesJSON6902:
- group: kubeadm.k8s.io
  version: v1beta2
  kind: ClusterConfiguration
  patch: |
    - op: add
      path: /apiServer/certSANs/-
      value: my-loopback

After the cluster has been created, modify the kube-apiserver --advertise-address flag in /etc/kubernetes/manifests/kube-apiserver.yaml
(it is a static pod manifest, so once you write the file the kubelet restarts the pod with the new config):

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=172.17.0.4
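
The manifest edit above could be scripted along these lines. It is a minimal sketch: `set_advertise_address` is a hypothetical helper name, and it assumes the flag already appears exactly once in the manifest, as in the excerpt.

```shell
#!/usr/bin/env bash
# Sketch: rewrite the --advertise-address flag in a kube-apiserver manifest.
# set_advertise_address is a hypothetical helper; it assumes the flag is already present.
set_advertise_address() {
  local manifest="$1" addr="$2" tmp
  # Build the new content outside the manifests directory, then overwrite in place.
  tmp="$(mktemp)"
  sed "s|--advertise-address=[^[:space:]]*|--advertise-address=${addr}|" "$manifest" > "$tmp"
  cat "$tmp" > "$manifest"
  rm -f "$tmp"
}

# Intended use inside the control-plane node (the address is a placeholder):
#   set_advertise_address /etc/kubernetes/manifests/kube-apiserver.yaml 1.1.1.1
```

The temporary file is created with `mktemp` rather than next to the manifest so that no stray file lands in the static pod directory.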

and then change the node-ip flag on all the kubelets:

root@kind-worker:/# more /var/lib/kubelet/kubeadm-flags.env 
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --fail-swap-on=false --node-ip=172.17.0.4"

and restart them with systemctl restart kubelet to use the new config.
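
The kubelet step could be automated per node roughly as follows. This is a sketch under assumptions: `set_node_ip` is a hypothetical helper, and it expects the file to hold a single double-quoted `KUBELET_KUBEADM_ARGS="..."` line like the one shown above.

```shell
#!/usr/bin/env bash
# Sketch: set --node-ip in a kubeadm-flags.env style file.
# set_node_ip is a hypothetical helper; it replaces an existing --node-ip value,
# or appends the flag before the closing quote when it is missing.
set_node_ip() {
  local env_file="$1" ip="$2" tmp
  tmp="$(mktemp)"
  if grep -q -- '--node-ip=' "$env_file"; then
    sed "s|--node-ip=[^\" ]*|--node-ip=${ip}|" "$env_file" > "$tmp"
  else
    sed "s|\"\$| --node-ip=${ip}\"|" "$env_file" > "$tmp"
  fi
  cat "$tmp" > "$env_file"
  rm -f "$tmp"
}

# Intended use inside each node (the address is a placeholder), followed by a kubelet restart:
#   set_node_ip /var/lib/kubelet/kubeadm-flags.env 1.1.1.2
#   systemctl restart kubelet
```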

@aojea
Contributor Author

aojea commented Feb 22, 2020

---
# Using KIND to emulate complex network scenarios [Linux Only]

KIND runs Kubernetes cluster in Docker, and leverages Docker networking for all the network features: portmapping, IPv6, containers connectivity, ...
Member

...connectivity, etc.

Member

portmapping -> port mapping

valid_lft forever preferred_lft forever
{{< /codeFromInline >}}

Docker also creates iptables NAT rules on the docker host that masquerade the traffic from the containers connected to docker0 bridge to connect to the outside world.
Member

the docker host -> the Docker host


## Multiple clusters

As we explained before, all KIND clusters are sharing the same docker network, that means that all the cluster nodes have direct connectivity.
Member

docker network -> Docker network


{{< /codeFromInline >}}

Then we just need to install the routes obtained from cluterA in each node of clusterB and viceversa:
Member

viceversa -> vice versa


### Example: Multiple network interfaces and Multi-Home Nodes

There can be scenarios that requite multiple interfaces in the KIND nodes to test multi-homing, VLANS, CNI plugins, ...
Member

...CNI plugins, etc.

inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
{{< /codeFromInline >}}

That means that Pods will be able to reach other Docker containers that does not belong to any KIND cluster, however, the Docker container will not be able to answer to the Pod IP address until we install the correspoding routes.
Member

correspoding -> corresponding

- --advertise-address=172.17.0.4
```

and then change in all the nodes the kubelet `node-ip` flag:
Member

and then change the node-ip flag for the kubelets on all the nodes:

KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --fail-swap-on=false --node-ip=172.17.0.4"
```

and restart them `systemctl restart kubelet` to use the new config
Member

Finally restart the kubelets to use the new configuration with systemctl restart kubelet.

Member

important to note here is that calling kubeadm init / join again on the node will override /var/lib/kubelet/kubeadm-flags.env. alternative is to use /etc/default/kubelet
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/kubelet-integration/#the-kubelet-drop-in-file-for-systemd

Contributor Author

let me add it as a note, due to the ephemeral nature of the nodes I don't expect people to issue those commands ... but 🤷‍♂️


It's important to note that calling `kubeadm init / join` again on the node will override `/var/lib/kubelet/kubeadm-flags.env`. An [alternative is to use /etc/default/kubelet](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/kubelet-integration/#the-kubelet-drop-in-file-for-systemd).S
Member

trailing S after the .
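
For the `/etc/default/kubelet` alternative mentioned in this thread, a sketch could look like the following. `write_kubelet_defaults` is a hypothetical helper, and the sketch assumes the node's kubelet systemd drop-in reads `KUBELET_EXTRA_ARGS` from that file, as described in the linked kubeadm documentation. Note that it overwrites the target file.

```shell
#!/usr/bin/env bash
# Sketch: persist --node-ip through KUBELET_EXTRA_ARGS instead of kubeadm-flags.env.
# write_kubelet_defaults is a hypothetical helper; it OVERWRITES the given file.
write_kubelet_defaults() {
  local file="$1" ip="$2"
  printf 'KUBELET_EXTRA_ARGS=--node-ip=%s\n' "$ip" > "$file"
}

# Intended use inside a node (the address is a placeholder), followed by a kubelet restart:
#   write_kubelet_defaults /etc/default/kubelet 1.1.1.2
#   systemctl restart kubelet
```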

@aojea
Contributor Author

aojea commented Feb 29, 2020

/retest
/assign @BenTheElder

inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
{{< /codeFromInline >}}

That means that Pods will be able to reach other Docker containers that does not belong to any KIND cluster, however, the Docker container will not be able to answer to the Pod IP address until we install the corresponding routes.

Since you are referring to multiple containers, use do instead of does

@BenTheElder
Member

Sorry for the immense delay. I'd been hoping to get #148 done faster. I'd still like to hold off on detailing networking internals until after I'm done taking a swing at changing them :D

@aojea
Contributor Author

aojea commented Apr 28, 2020

/hold
this needs to be updated to match the current status; we have custom bridges now and cluster restart 😄

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2020
@aojea aojea marked this pull request as draft June 16, 2020 20:29
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 16, 2020
ip route add 10.110.2.0/24 via 172.17.0.2

$kubectl --context kind-clusterB get nodes -o=jsonpath='{range .items[*]}{"ip route add "}{.spec.podCIDR}{" via "}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
ip route add 10.120.0.0/24 via 172.17.0.7

Is this supposed to be 220? Also why are there three results here when each cluster has two nodes?

Contributor Author

heh, good catch on both things: it is 220, and the config should have 3 nodes
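
To make mismatches like this easier to spot, the route commands can be derived from "podCIDR nodeIP" pairs rather than typed by hand. A minimal sketch, where `routes_from_pairs` is a hypothetical helper and the sample pair is copied from the output quoted in this thread:

```shell
#!/usr/bin/env bash
# Sketch: turn "podCIDR nodeIP" lines on stdin into `ip route add` commands.
# routes_from_pairs is a hypothetical helper; blank or incomplete lines are skipped.
routes_from_pairs() {
  local cidr ip
  while read -r cidr ip; do
    if [ -n "$cidr" ] && [ -n "$ip" ]; then
      echo "ip route add ${cidr} via ${ip}"
    fi
  done
}

# The pairs could come from a jsonpath query similar to the one in the doc, e.g.:
#   kubectl --context kind-clusterA get nodes \
#     -o=jsonpath='{range .items[*]}{.spec.podCIDR}{" "}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
printf '%s\n' '10.110.2.0/24 172.17.0.2' | routes_from_pairs
```

Each generated line would then be run in every node of the other cluster, for example with `docker exec <node> ...`.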

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 27, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 26, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

9 participants